Large scale automated phylogenomic analysis of bacterial isolates and the Evergreen Online platform.

Szarvas, Judit; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Thomsen, Martin Christen Frølund; Aarestrup, Frank M; Lund, Ole.

Commun Biol ; 3(1): 137, 2020 03 20.

Article in English | MEDLINE | ID: mdl-32198478

ABSTRACT

Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data is divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide based genetic distance is calculated between the sequences in each set, and isolates are clustered together at 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are added back. The method is accurate at grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.

Subject(s)

Bacteria/genetics , Computational Biology , DNA, Bacterial/genetics , Environmental Monitoring , Foodborne Diseases/microbiology , Online Systems , Phylogeny , Polymorphism, Single Nucleotide , Whole Genome Sequencing , Automation, Laboratory , Bacteria/classification , Bacteria/isolation & purification , Bacteria/pathogenicity , DNA, Bacterial/isolation & purification , Workflow
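
The clustering step described in the abstract above (pairwise nucleotide distance, then grouping at 10 SNPs) can be illustrated with a minimal sketch. This is not the Evergreen implementation: the function names `snp_distance` and `cluster_isolates`, the choice of single-linkage grouping, the "less than or equal to" reading of the 10-SNP cut-off, and the handling of ambiguous `N` positions are all assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length consensus sequences.

    Positions where either sequence has an ambiguous base (N) are skipped
    (an assumption; the abstract does not say how ambiguity is handled).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same reference")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def cluster_isolates(seqs: dict, threshold: int = 10) -> list:
    """Group isolates whose pairwise distance is <= threshold (single linkage)."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: iso2 differs from iso1 at 3 sites, iso3 at 20 sites.
example = {
    "iso1": "A" * 40,
    "iso2": "C" * 3 + "A" * 37,
    "iso3": "A" * 20 + "G" * 20,
}
clusters = cluster_isolates(example, threshold=10)
```

With the toy data, `iso1` and `iso2` fall into one cluster and `iso3` stays on its own; one representative per cluster would then be passed on to tree inference, with the remaining members added back afterwards, as the abstract describes.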

RUCS: rapid identification of PCR primers for unique core sequences.

Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole.

Bioinformatics ; 33(24): 3917-3921, 2017 Dec 15.

Article in English | MEDLINE | ID: mdl-28968748

ABSTRACT

MOTIVATION: Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool to both identify unique targets, and to validate the vast number of potential primer pairs for the targets in silico. RESULTS: Here we present RUCS, a program that will find PCR primer pairs and probes for the unique core sequences of a positive genome dataset complement to a negative genome dataset. The resulting primer pairs and probes are in addition to simple selection also validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon with the target length for the samples containing mcr-1 and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. AVAILABILITY AND IMPLEMENTATION: Source code is freely available on https://bitbucket.org/genomicepidemiology/rucs. Web service is freely available on https://cge.cbs.dtu.dk/services/RUCS. CONTACT: mcft@cbs.dtu.dk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

DNA Primers , Polymerase Chain Reaction/methods , Software , Base Sequence , DNA Primers/chemistry

Correction: MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas.

PLoS One ; 12(6): e0179778, 2017.

Article in English | MEDLINE | ID: mdl-28604817

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0176469.].

MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas.

PLoS One ; 12(5): e0176469, 2017.

Article in English | MEDLINE | ID: mdl-28467460

ABSTRACT

An increasing number of species and gene identification studies rely on the use of next generation sequence analysis of either single isolate or metagenomics samples. Several methods are available to perform taxonomic annotations and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprised of 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.

Subject(s)

Metagenomics/methods , High-Throughput Nucleotide Sequencing

WGS-based surveillance of third-generation cephalosporin-resistant Escherichia coli from bloodstream infections in Denmark.

Roer, Louise; Hansen, Frank; Thomsen, Martin Christen Frølund; Knudsen, Jenny Dahl; Hansen, Dennis Schrøder; Wang, Mikala; Samulioniené, Jurgita; Justesen, Ulrik Stenz; Røder, Bent L; Schumacher, Helga; Østergaard, Claus; Andersen, Leif Percival; Dzajic, Esad; Søndergaard, Turid Snekloth; Stegger, Marc; Hammerum, Anette M; Hasman, Henrik.

J Antimicrob Chemother ; 72(7): 1922-1929, 2017 07 01.

Article in English | MEDLINE | ID: mdl-28369408

ABSTRACT

Objectives: To evaluate a genome-based surveillance of all Danish third-generation cephalosporin-resistant Escherichia coli (3GC-R Ec) from bloodstream infections between 2014 and 2015, focusing on horizontally transferable resistance mechanisms. Methods: A collection of 552 3GC-R Ec isolates were whole-genome sequenced and characterized by using the batch uploader from the Center for Genomic Epidemiology (CGE) and automatically analysed using the CGE tools according to resistance profile, MLST, serotype and fimH subtype. Additionally, the phylogenetic relationship of the isolates was analysed by SNP analysis. Results: The majority of the 552 isolates were ESBL producers (89%), with blaCTX-M-15 being the most prevalent (50%) gene, followed by blaCTX-M-14 (14%), blaCTX-M-27 (11%) and blaCTX-M-101 (5%). ST131 was detected in 50% of the E. coli isolates, with the remaining isolates belonging to 73 other STs, including globally disseminated STs (e.g. ST10, ST38, ST58, ST69 and ST410). Five of the bloodstream isolates were carbapenemase producers, carrying blaOXA-181 (3) and blaOXA-48 (2). Phylogenetic analysis revealed 15 possible national outbreaks during the 2 year period, one caused by a novel ST131/blaCTX-M-101 clone, here observed for the first time in Denmark. Additionally, the analysis revealed three individual cases with possible persistence of closely related clones collected more than 13 months apart. Conclusions: Continuous WGS-based national surveillance of 3GC-R Ec, in combination with more detailed epidemiological information, can improve the ability to follow the population dynamics of 3GC-R Ec, thus allowing for the detection of potential outbreaks and the effects of changing treatment regimens in the future.

Subject(s)

Bacteremia/microbiology , Cephalosporin Resistance/genetics , Cephalosporins/pharmacology , Escherichia coli Infections/microbiology , Escherichia coli/drug effects , Escherichia coli/genetics , Genome, Bacterial , Anti-Bacterial Agents/pharmacology , Bacteremia/epidemiology , Bacterial Proteins/biosynthesis , Bacterial Proteins/genetics , Electrophoresis, Gel, Pulsed-Field , Epidemiological Monitoring , Escherichia coli/enzymology , Escherichia coli Infections/epidemiology , Gene Transfer, Horizontal , High-Throughput Nucleotide Sequencing , Humans , Microbial Sensitivity Tests , Multilocus Sequence Typing , Phylogeny , Polymerase Chain Reaction , beta-Lactamases/biosynthesis , beta-Lactamases/genetics

A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance.

Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole.

PLoS One ; 11(6): e0157718, 2016.

Article in English | MEDLINE | ID: mdl-27327771

ABSTRACT

Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and comparing that to the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available.

Subject(s)

Bacteria/genetics , Diagnostic Techniques and Procedures , Genome, Bacterial , Sequence Analysis, DNA/methods , Statistics as Topic , Algorithms , Bacteria/pathogenicity , Base Sequence , Plasmids/metabolism , Software , Species Specificity , Time Factors , Virulence/genetics

Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

Thomsen, Martin Christen Frølund; Nielsen, Morten.

Nucleic Acids Res ; 40(Web Server issue): W281-7, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22638583

ABSTRACT

Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

Subject(s)

Amino Acid Motifs , Position-Specific Scoring Matrices , Sequence Analysis, Protein , Software , Binding Sites , Computer Graphics , Internet , Sequence Alignment , User-Computer Interface
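
As a rough illustration of the "information content" and "pseudo counts" mentioned in the Seq2Logo abstract above, the sketch below computes the classic Shannon information content per alignment column. It is not Seq2Logo's method: the flat additive pseudocount used here is a simplifying assumption, sequence weighting is left out entirely, and the function name `column_information` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(peptides, pseudocount=0.0):
    """Return the information content (bits) of each column of equal-length peptides.

    A flat pseudocount is added to every amino acid count before frequencies
    are computed (an illustrative simplification).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    max_bits = math.log2(len(AMINO_ACIDS))
    info = []
    for i in range(length):
        # Count only the 20 standard amino acids; gaps and unknowns are ignored.
        counts = Counter(p[i] for p in peptides if p[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            info.append(0.0)
            continue
        entropy = 0.0
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            if freq > 0:
                entropy -= freq * math.log2(freq)
        info.append(max_bits - entropy)
    return info


# Hypothetical toy alignment: positions 1-2 fully conserved, position 3 split K/R.
bits = column_information(["AAK", "AAR", "AAK", "AAR"])
smoothed = column_information(["AAK", "AAR", "AAK", "AAR"], pseudocount=1.0)
```

With only four observations, the unsmoothed columns look maximally informative; adding a pseudocount pulls the estimates down, which is the small-sample effect the abstract refers to.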

snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M.

BMC Genomics ; 13 Suppl 7: S6, 2012.

Article in English | MEDLINE | ID: mdl-23281601

ABSTRACT

BACKGROUND: The advances and decreasing economical cost of whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct a phylogenetic tree from WGS data. RESULTS: Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed from concatenated SNPs using FastTree and a Perl script. The online server was implemented by HTML, Java and Python script. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. CONCLUSIONS: The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.

Subject(s)

Bacteria/genetics , Genome, Bacterial , Polymorphism, Single Nucleotide , Bacteria/classification , Databases, Genetic , Internet , Mycobacterium tuberculosis/genetics , Salmonella typhimurium/genetics , Software , Staphylococcus aureus/genetics , User-Computer Interface , Vibrio cholerae/genetics
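
The step in the snpTree abstract above where "SNPs are concatenated based on position on reference genome" can be sketched as follows. This is an illustrative sketch, not the snpTree code: the input layout (a dictionary of per-isolate SNP calls keyed by 0-based reference position), the rule that an isolate without a call carries the reference base, and the function names `concatenate_snps` and `to_fasta` are assumptions.

```python
def concatenate_snps(reference: str, calls: dict) -> dict:
    """Build one pseudo-sequence per isolate from the union of variable positions.

    `calls` maps isolate name -> {0-based reference position: called base}.
    An isolate with no call at a position is assumed to carry the reference base.
    """
    positions = sorted({pos for snps in calls.values() for pos in snps})
    return {
        name: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for name, snps in calls.items()
    }


def to_fasta(seqs: dict) -> str:
    """Serialise the concatenated SNP sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


# Hypothetical toy reference and SNP calls.
reference = "ACGTACGT"
calls = {
    "iso1": {1: "T", 5: "G"},
    "iso2": {5: "G"},
    "iso3": {},
}
alignment = concatenate_snps(reference, calls)
fasta_text = to_fasta(alignment)
```

The resulting FASTA text holds one equal-length sequence per isolate, which is the kind of alignment a tree-building program such as FastTree takes as input.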
